(improvement) deserializers: use direct PyUnicode_DecodeUTF8/ASCII from C buffer pointer #8

Open
mykaul wants to merge 27 commits into master from perf/direct-utf8-decode

mykaul (Owner) commented Mar 13, 2026

Summary

  • Replace the two-step to_bytes(buf).decode('utf8') pattern in DesUTF8Type and DesAsciiType with direct CPython C API calls (PyUnicode_DecodeUTF8 and PyUnicode_DecodeASCII)
  • Eliminates an intermediate bytes object allocation per text cell
  • Text (UTF8Type/VarcharType) is the most common CQL column type
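
The difference between the two patterns can be illustrated from plain Python. This is a minimal sketch, assuming CPython and using `ctypes.pythonapi` as a stand-in for the Cython call site; the helper names `decode_two_step` and `decode_direct` are hypothetical and are not part of the driver.

```python
import ctypes

# Bind the CPython C API function named in the summary:
#   PyObject *PyUnicode_DecodeUTF8(const char *str, Py_ssize_t size, const char *errors)
_decode_utf8 = ctypes.pythonapi.PyUnicode_DecodeUTF8
_decode_utf8.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t, ctypes.c_char_p]
_decode_utf8.restype = ctypes.py_object


def decode_two_step(c_buf, size):
    """Old pattern: copy the C buffer into a bytes object, then decode it."""
    raw = ctypes.string_at(ctypes.addressof(c_buf), size)  # intermediate bytes object
    return raw.decode("utf8")


def decode_direct(c_buf, size):
    """New pattern: decode straight from the buffer pointer, no bytes object."""
    return _decode_utf8(ctypes.addressof(c_buf), size, None)  # errors=NULL means strict


# A ctypes char array stands in for the driver's C buffer (assumption for this sketch).
payload = "héllo wörld".encode("utf8")
c_buf = ctypes.create_string_buffer(payload, len(payload))

print(decode_two_step(c_buf, len(payload)))
print(decode_direct(c_buf, len(payload)))
```

Both helpers return the same `str`; the direct variant skips the temporary `bytes` object, which is the allocation the PR removes per text cell.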

Benchmark Results

Cython row parsing pipeline, median times:

| Scenario                        | Before (original) | After (direct decode) | Speedup |
|---------------------------------|------------------:|----------------------:|--------:|
| UTF8 1row x 1col short (11B)    |            565 ns |                454 ns |   1.24x |
| UTF8 1row x 10col short         |          1,594 ns |              1,023 ns |   1.56x |
| UTF8 100rows x 5col medium      |         61,396 ns |             28,766 ns |   2.13x |
| UTF8 1000rows x 5col medium     |        547,145 ns |            290,361 ns |   1.88x |
| UTF8 100rows x 5col long (200B) |         57,940 ns |             35,680 ns |   1.62x |
| UTF8 100rows x 5col multibyte   |        125,149 ns |            103,370 ns |   1.21x |
| ASCII 100rows x 5col medium     |         41,608 ns |             35,817 ns |   1.16x |
| ASCII 1000rows x 5col medium    |        416,350 ns |            374,341 ns |   1.11x |
| Mixed 100rows 3text+2int        |         44,646 ns |             31,189 ns |   1.43x |
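
The Speedup column is the ratio of the "before" median to the "after" median. A short sketch that recomputes it from the figures in the table above (the `speedup` helper is only illustrative):

```python
# (scenario, before_ns, after_ns), copied from the benchmark table
results = [
    ("UTF8 1row x 1col short (11B)", 565, 454),
    ("UTF8 1row x 10col short", 1_594, 1_023),
    ("UTF8 100rows x 5col medium", 61_396, 28_766),
    ("UTF8 1000rows x 5col medium", 547_145, 290_361),
    ("UTF8 100rows x 5col long (200B)", 57_940, 35_680),
    ("UTF8 100rows x 5col multibyte", 125_149, 103_370),
    ("ASCII 100rows x 5col medium", 41_608, 35_817),
    ("ASCII 1000rows x 5col medium", 416_350, 374_341),
    ("Mixed 100rows 3text+2int", 44_646, 31_189),
]


def speedup(before_ns, after_ns):
    """Speedup factor: how many times faster the 'after' run is."""
    return before_ns / after_ns


for name, before_ns, after_ns in results:
    print(f"{name}: {speedup(before_ns, after_ns):.2f}x")
```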

Tests

All existing unit tests pass (62 type tests, 116 total across key suites). The PR also adds 7 new correctness tests covering empty strings, multibyte UTF-8, long strings, NULL values, and ASCII.

mykaul force-pushed the perf/direct-utf8-decode branch from cbb8652 to 97eb4e8 on March 15, 2026 15:56
sylwiaszunejko and others added 25 commits March 17, 2026 23:47
Introduce the data layer for Private Link client routes support:

- ClientRoutesChangeType enum for CLIENT_ROUTES_CHANGE event types
- ClientRouteProxy dataclass and ClientRoutesConfig for user-facing
  configuration
- _Route frozen dataclass for immutable route records
- _RouteStore for thread-safe route storage with atomic update/merge
  and preferred route selection that avoids unnecessary connection_id
  migration when multiple routes exist for the same host
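
As a rough illustration of the data layer described above, the following sketch pairs a frozen dataclass with a lock-protected store. The class names `Route` and `RouteStore`, the field names, and the method names are assumptions made for this example; the commit's actual `_Route` and `_RouteStore` may differ, and preferred-route selection is not modelled here.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Route:
    # Hypothetical fields, chosen only for illustration.
    host_id: str
    connection_id: str
    address: str
    port: int


class RouteStore:
    """Thread-safe mapping of host_id to Route with atomic update and merge."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._routes: Dict[str, Route] = {}

    def update(self, routes: Iterable[Route]) -> None:
        """Replace the whole table in one step (full refresh)."""
        new_routes = {r.host_id: r for r in routes}
        with self._lock:
            self._routes = new_routes

    def merge(self, routes: Iterable[Route]) -> None:
        """Insert or overwrite only the given routes (targeted update)."""
        with self._lock:
            merged = dict(self._routes)
            merged.update({r.host_id: r for r in routes})
            self._routes = merged

    def get(self, host_id: str) -> Optional[Route]:
        with self._lock:
            return self._routes.get(host_id)
```

Because each `Route` is immutable and the table is swapped under a lock, readers never observe a half-applied update.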
Add _ClientRoutesHandler which manages the full lifecycle of dynamic
address translation via system.client_routes:

- initialize(): loads all routes at startup and on control connection
  reconnect
- handle_client_routes_change(): processes CLIENT_ROUTES_CHANGE events
  with targeted merge or full refresh depending on event data
- _query_all_routes_for_connections(): complete refresh query using
  connection_id IN (...)
- _query_routes_for_change_event(): targeted query grouping by
  connection_id with host_id IN (...) per group
- _execute_routes_query(): common query execution and result parsing
  with proxy address override support
- resolve_host(): host_id to (address, port) resolution with DNS lookup
- ClientRoutesEndPointFactory: creates endpoints from system.peers rows
  by extracting host_id, deferring address translation and DNS resolution
  until connection time
- ClientRoutesEndPoint: endpoint that resolves via _ClientRoutesHandler
  on each connection attempt, ensuring immediate reaction to route changes
  and CLIENT_ROUTES_CHANGE events

Cluster:
- Add client_routes_config parameter with mutual exclusivity check
  against endpoint_factory
- Create _ClientRoutesHandler and ClientRoutesEndPointFactory when
  client_routes_config is provided

ControlConnection:
- Register CLIENT_ROUTES_CHANGE event watcher when handler is present
- Forward events to handler via _handle_client_routes_change
- Trigger full route re-read on control connection reconnection

Cover ClientRouteEntry/ClientRoutesConfig validation, _RouteStore
get/merge operations, _ClientRoutesHandler initialization,
ClientRoutesEndPoint resolution with and without route mappings,
and SSL check_hostname rejection with client_routes_config.

Add comprehensive integration tests covering:
- TCP proxy and NLB emulator infrastructure for simulating
  private link connectivity
- query_routes filtering with different connection/host ID combinations
- Full private-link connectivity verifying all driver connections
  go exclusively through the NLB proxy
- Dynamic route updates via REST API with driver reconnection
  through new proxy ports

ScyllaDB recently started to rely on the options "--auth-superuser-name"
and "--auth-superuser-salted-password" to ensure that a
cassandra/cassandra user exists for tests. Without those options,
a default superuser no longer exists.

…ames

Skip the regex scanner and stack-based parser in parse_casstype_args()
when the type string has no parentheses. For simple types like
'AsciiType' or 'org.apache.cassandra.db.marshal.FloatType', go directly
to lookup_casstype_simple() which is just a prefix strip + dict lookup.

This avoids re.Scanner, re.split on ':' / '=>', int() try/except, and
list-of-lists stack manipulation for the common case of non-parameterized
types.
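
The fast path described above amounts to a cheap parenthesis check in front of the full parser. A minimal sketch, assuming a toy registry; `lookup_simple`, `parse_full`, and `_REGISTRY` are hypothetical stand-ins and not the driver's real functions or tables:

```python
_PREFIX = "org.apache.cassandra.db.marshal."

# Toy registry standing in for the driver's type table (assumption).
_REGISTRY = {"AsciiType": "ascii", "FloatType": "float"}


def lookup_simple(typestring):
    """Prefix strip plus dict lookup, with no regex or stack involved."""
    if typestring.startswith(_PREFIX):
        typestring = typestring[len(_PREFIX):]
    return _REGISTRY[typestring]


def parse_full(typestring):
    """Placeholder for the scanner and stack-based parser (not shown)."""
    raise NotImplementedError("parameterized types take the slow path")


def lookup(typestring):
    # Non-parameterized names contain no parentheses, so skip the parser.
    if "(" not in typestring:
        return lookup_simple(typestring)
    return parse_full(typestring)


print(lookup("AsciiType"))
print(lookup("org.apache.cassandra.db.marshal.FloatType"))
```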

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

The time.sleep(10) in setup_keyspace() is redundant because callers
already ensure the cluster is fully ready before calling it:
- use_cluster() calls start_cluster_wait_for_up() which uses
  wait_for_binary_proto=True + wait_other_notice=True, then
  wait_for_node_socket() per node
- External cluster path (wait=False) had no sleep anyway

Remove the wait parameter entirely and its associated sleep, saving 10s
per cluster startup.

Replace fixed sleeps with condition-based polling to speed up tests:

- simulacron/utils.py: replace 5s sleep with HTTP endpoint polling
  (max 15s timeout, typically <1s)
- test_authentication.py: replace 10s sleep with auth readiness poll
  that tries connecting with default credentials
- upgrade/__init__.py: replace 10s auth sleep with same polling pattern
- upgrade/test_upgrade.py: replace 3x 20s sleeps (60s total) with
  control connection readiness polling

Total potential saving: ~95s of unconditional waiting per test run.
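
Condition-based polling of the kind described above can be sketched as a loop with a deadline. The helper name `wait_for` and its defaults are assumptions for illustration; the test suite's own helpers may look different.

```python
import time


def wait_for(condition, timeout=15.0, interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example: a condition that becomes true on the third check.
calls = {"n": 0}


def ready():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_for(ready, timeout=1.0, interval=0.01))
```

Unlike a fixed sleep, the loop returns as soon as the condition holds, so the full timeout is only spent in the failure case.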
Replace fixed sleeps with condition-based polling in four test files:

- test_shard_aware.py: replace 25s of sleeps (5+10+5+5) with
  wait_until_not_raised polling for reconnection after shard connection
  close and iptables blocking
- test_metrics.py: replace 15s of sleeps (5+5+5) with polling for
  cluster recovery and node-down detection
- test_tablets.py: replace 13s of sleeps (3+10) with polling for
  metadata refresh and decommission completion
- simulacron/test_connection.py: replace 20s of sleeps (10+10) with
  polling for quiescent pool state

Total potential saving: ~73s of unconditional waiting.
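
For checks that signal "not ready yet" by raising, the polling loop retries on exceptions instead of testing a boolean. The sketch below is a hypothetical `retry_until_not_raised` written for this example; it is not the project's wait_until_not_raised helper, whose signature is not shown in this PR.

```python
import time


def retry_until_not_raised(fn, delay=0.5, max_attempts=20):
    """Call `fn` until it stops raising; re-raise the error from the final attempt.

    Assumes max_attempts >= 1. The worst-case wait is roughly
    delay * (max_attempts - 1) seconds plus the time spent inside `fn`.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(delay)


# Example: an operation that fails twice before succeeding.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("not reconnected yet")
    return "ok"


print(retry_until_not_raised(flaky, delay=0.01, max_attempts=5))
```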
… for invalidation

The tablet tests were intermittently failing because:
1. get_query_trace() used the default 2s max_wait, which is too short
   under resource pressure (--smp 2). Increased to 10s.
2. test_tablets_invalidation_decommission_non_cc_node used a fixed
   time.sleep(2) hoping tablet metadata invalidation would complete.
   Replaced with wait_until polling for the tablet record to be purged
   (0.5s delay, 20 attempts = 10s budget).

- test_cluster.py: replace sleep(1) x10 iterations with
  connect(wait_for_all_pools=True) for deterministic pool readiness
- test_query.py: replace sleep(5) with wait_until polling for
  'Preparing all known prepared statements' log message
- test_connection.py: replace sleep(2) with wait_until polling for
  host_down listener notification

…superuser config

Use set_configuration_options() (the Python API behind `ccm updateconf`) to
set auth_superuser_name and auth_superuser_salted_password directly in the
YAML config instead of passing them via the SCYLLA_EXT_OPTS environment
variable.

…ema_agreement

min(self._timeout, total_timeout - elapsed) raises TypeError when
control_connection_timeout is set to None, which is explicitly
documented as a supported value (meaning no timeout). Guard the
min() call so that when self._timeout is None, we use only the
remaining schema agreement wait time.
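
The guard can be sketched as a small function. The name `effective_timeout` and its parameters are hypothetical; only the `min()` behaviour with `None` is taken from the commit message.

```python
def effective_timeout(connection_timeout, total_timeout, elapsed):
    """Pick the per-query timeout while waiting for schema agreement.

    `connection_timeout` may be None (no timeout). In Python 3,
    min(None, x) raises TypeError, so the None case is handled first.
    """
    remaining = total_timeout - elapsed
    if connection_timeout is None:
        return remaining
    return min(connection_timeout, remaining)


print(effective_timeout(None, 10.0, 4.0))  # only the remaining wait applies
print(effective_timeout(2.0, 10.0, 4.0))   # the shorter connection timeout wins
```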
Patch reactor.running to False in setUp() so that maybe_start() always
enters the branch that spawns the reactor thread. Without this, leaked
global reactor state from prior tests can leave reactor.running as True,
causing maybe_start() to skip thread creation and the reactor.run mock
to never be called — making the assertion in test_connection_initialization
fail intermittently.

Observed in CI on PyPy 3.11 + macOS x86 (Rosetta 2), where timing
differences make the reactor state leak more likely.
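
The isolation technique can be shown with a stand-in object. In this sketch `FakeReactor`, `maybe_start`, and `ReactorStartTest` are hypothetical and replace the real Twisted reactor and driver code; only the idea of patching `running` to False in setUp() comes from the commit message.

```python
import unittest
from unittest import mock


class FakeReactor:
    """Stand-in for the global reactor (assumption for this sketch)."""

    def __init__(self):
        self.running = True  # simulates state leaked by an earlier test


reactor = FakeReactor()
started = []


def maybe_start():
    """Spawn the reactor thread only if the reactor is not already running."""
    if not reactor.running:
        started.append("thread")


class ReactorStartTest(unittest.TestCase):
    def setUp(self):
        # Force a known state so maybe_start() always takes the start branch.
        patcher = mock.patch.object(reactor, "running", False)
        patcher.start()
        self.addCleanup(patcher.stop)  # restore the original value afterwards

    def test_thread_is_started(self):
        started.clear()
        maybe_start()
        self.assertEqual(started, ["thread"])
```

Registering `patcher.stop` with `addCleanup` restores the original attribute even when the test fails, so the patch itself does not leak into later tests.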
The column kind filter at line 2744 used 'clustering_key' but
system_schema.columns uses 'clustering' as the kind value. This caused
clustering columns to not be excluded from the 'other columns' loop,
resulting in them being processed twice (once as clustering key, once
as regular column). The correct value 'clustering' was already used
6 lines above in the clustering key extraction loop.
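
The effect of the wrong kind value can be reproduced on a few sample rows. This is a sketch under assumptions: the row dictionaries, the `other_columns` helper, and the set of excluded kinds are made up for illustration and do not mirror the parser's actual code.

```python
# Sample rows shaped like system_schema.columns results (illustrative data).
rows = [
    {"column_name": "pk", "kind": "partition_key"},
    {"column_name": "ck", "kind": "clustering"},
    {"column_name": "v", "kind": "regular"},
]


def other_columns(rows, excluded_kinds):
    """Return the names of columns whose kind is not in `excluded_kinds`."""
    return [r["column_name"] for r in rows if r["kind"] not in excluded_kinds]


# Buggy filter: 'clustering_key' never matches, so 'ck' is processed again.
buggy = other_columns(rows, ("partition_key", "clustering_key"))

# Fixed filter: 'clustering' is the value actually stored in the kind column.
fixed = other_columns(rows, ("partition_key", "clustering"))

print(buggy)
print(fixed)
```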
ScyllaDB doesn't support triggers, so skip the triggers query when
connected to ScyllaDB. This is detected by checking if the connection
has shard awareness (using the existing _is_not_scylla() method).

Changes to both SchemaParserV3 and SchemaParserV4:
- Modified _query_all() to conditionally append triggers query only for non-ScyllaDB
- Modified _query_all() response unpacking to use array slicing for cleaner code
- Modified get_table() in V3 to conditionally query triggers

This eliminates unnecessary failed queries to system_schema.triggers on ScyllaDB.
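
A simplified picture of the conditional query list and the slice-based unpacking, assuming a much shorter query list than the real parsers use; `build_queries` and `unpack` are hypothetical helpers written for this example.

```python
def build_queries(is_scylla):
    """Build the schema query list, adding triggers only for non-ScyllaDB."""
    queries = [
        "SELECT * FROM system_schema.tables",
        "SELECT * FROM system_schema.columns",
    ]
    if not is_scylla:
        queries.append("SELECT * FROM system_schema.triggers")
    return queries


def unpack(responses, is_scylla):
    """Split responses positionally; triggers are empty when not queried."""
    tables, columns = responses[:2]
    triggers = [] if is_scylla else responses[2]
    return tables, columns, triggers


print(len(build_queries(is_scylla=True)))   # triggers query omitted
print(len(build_queries(is_scylla=False)))  # triggers query included
```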

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

- Fix spelling: 'tring' → 'string' in docstring
- Remove extra 't' at end of comment
- Refactor complex list comprehension for clarity
- Use 'is None' instead of '== None' for None comparison

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
mykaul force-pushed the perf/direct-utf8-decode branch from 97eb4e8 to a9815c1 on April 2, 2026 14:30
mykaul added 2 commits April 2, 2026 20:07
…om C buffer pointer

Replace the two-step to_bytes(buf).decode('utf8') pattern in DesUTF8Type and
DesAsciiType with direct CPython C API calls (PyUnicode_DecodeUTF8 and
PyUnicode_DecodeASCII). This eliminates an intermediate bytes object
allocation per text cell — the old code created a Python bytes object from
the C buffer pointer via to_bytes(buf), then immediately decoded it to str
and discarded the bytes.

Text (UTF8Type/VarcharType) is the most common CQL column type, so this
optimization applies to the majority of cells in typical workloads.

Benchmark results (Cython row parsing pipeline, median times):

| Scenario                        | Before (original) | After (direct decode) | Speedup |
|---------------------------------|-------------------:|----------------------:|--------:|
| UTF8 1row x 1col short (11B)   |             565 ns |                454 ns |   1.24x |
| UTF8 1row x 10col short        |           1,594 ns |              1,023 ns |   1.56x |
| UTF8 100rows x 5col medium     |          61,396 ns |             28,766 ns |   2.13x |
| UTF8 1000rows x 5col medium    |         547,145 ns |            290,361 ns |   1.88x |
| UTF8 100rows x 5col long(200B) |          57,940 ns |             35,680 ns |   1.62x |
| UTF8 100rows x 5col multibyte  |         125,149 ns |            103,370 ns |   1.21x |
| ASCII 100rows x 5col medium    |          41,608 ns |             35,817 ns |   1.16x |
| ASCII 1000rows x 5col medium   |         416,350 ns |            374,341 ns |   1.11x |
| Mixed 100rows 3text+2int       |          44,646 ns |             31,189 ns |   1.43x |

All existing unit tests pass (62 type tests, 116 total across key suites).

…uards, add invalid-input tests

- Replace hand-rolled try/except ImportError with the project-standard
  cythontest decorator and HAVE_CYTHON conditional imports, so
  VERIFY_CYTHON=True CI mode fails loudly instead of silently skipping.
- Add pytest.importorskip guards to the benchmark file so it skips
  gracefully when pytest-benchmark or Cython extensions are missing.
- Add test_utf8_invalid_bytes and test_ascii_invalid_bytes to confirm
  error propagation through the DriverException wrapper.
mykaul force-pushed the perf/direct-utf8-decode branch from a9815c1 to f0ce46c on April 2, 2026 17:07
4 participants